New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LMDB corruption recovery -- follow-up #3880
LMDB corruption recovery -- follow-up #3880
Conversation
This pull request introduces 10 alerts and fixes 25 when merging 3aebce9 into 4ebcf91 - view on LGTM.com new alerts:
fixed alerts:
|
a4eb259
to
3f1a347
Compare
This pull request introduces 12 alerts and fixes 25 when merging 3f1a347 into e9b38c1 - view on LGTM.com new alerts:
fixed alerts:
|
3f1a347
to
5f1b7b7
Compare
5f1b7b7
to
6a9bf01
Compare
b7c1c67
to
300930a
Compare
This pull request introduces 11 alerts and fixes 1 when merging 300930a into 7b2938d - view on LGTM.com new alerts:
fixed alerts:
|
137b4a5
to
e370fa5
Compare
This pull request introduces 11 alerts and fixes 1 when merging e370fa5 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
e370fa5
to
31f0b45
Compare
C doesn't support exceptions so in the LMDB corruption handling callback we have no way to backtrack to the code that run the LMDB operation that hit the corruption and tell it that things went wrong. Our only chance is to exit the process. To avoid our exit handlers from potentially messing up the freshly-repaired LMDB file, we have to either use _exit() or make sure the LMDB file is not touched by the exit handlers. This change implements the latter. We also use special exit codes to signal that LMDB corruption was detected and either repaired or failed to be reparied. Ticket: CFE-3127 Changelog: None
We need to make sure the file is repaired for future use.
31f0b45
to
b5ee6a3
Compare
This pull request introduces 11 alerts and fixes 1 when merging b5ee6a3 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
b5ee6a3
to
e36cf72
Compare
This pull request introduces 11 alerts and fixes 1 when merging e36cf72 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good! (Please make sure it passes jenkins before merging)
e36cf72
to
2e12ad9
Compare
This pull request introduces 11 alerts and fixes 1 when merging 2e12ad9 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
2e12ad9
to
9e3c789
Compare
This pull request introduces 11 alerts and fixes 1 when merging 9e3c789 into f3bad22 - view on LGTM.com new alerts:
fixed alerts:
|
SIGBUS means unaligned memory access or an attempt to access area outside of mmap(). In most cases for us this means an LMDB corruption. Thus if a process gets SIGBUS, it touches a flag file that triggers a DB repair attempt in the next agent or cf-execd run. Ticket: CFE-3127 Changelog: Received SIGBUS now triggers a repair of local DBs
They are more reusable there. Moved there in NorthernTechHQ/libntech@996300c.
repair_lmdb_file() is the function that repairs the given LMDB file and thus it should be the same function recording the repair timestamp. However, some complexities are out of its scope. Let's make it either take a file descriptor for the repair timestamp file, assuming that file locking and such stuff is taken care of, or do the simple repair timestamp recording on its own. Also treat its return code in a more appropriate way -- only a real failure should be considered failure. If the data recovery/replication fails, but the LMDB file is moved out of the way (removed), it should be considered as successful repair.
File locking in libntech is now a bit more abstract and versatile.
9e3c789
to
3817090
Compare
This pull request introduces 10 alerts and fixes 1 when merging 3817090 into 36379af - view on LGTM.com new alerts:
fixed alerts:
|
@cf-bottom run this throught he new shiny jenkins, please! Incl. exotics. |
Alright, I triggered a build: (with exotics) Jenkins: https://ci.cfengine.com/job/pr-pipeline/4006/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-4006/ |
@cf-bottom One more jenkins, with exotics. |
Sure, I triggered a build: (with exotics) Jenkins: https://ci.cfengine.com/job/pr-pipeline/4017/ Packages: http://buildcache.cfengine.com/packages/testing-pr/jenkins-pr-pipeline-4017/ |
Follows #3873
Merge together:
#3880
https://github.com/cfengine/enterprise/pull/556
NorthernTechHQ/libntech#27